Import all the necessary libraries.

In [ ]:
# Standard library
import pathlib
from pathlib import Path

# Data handling
import numpy as np
import pandas as pd

# Plotting
from matplotlib import pyplot as plt
import seaborn as sns
import plotly
import plotly.graph_objects as go
plotly.offline.init_notebook_mode()

# Machine learning utilities
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, classification_report,
                             precision_recall_curve, ConfusionMatrixDisplay)

# Deep learning (import consistently from tensorflow.keras to avoid
# mixing the standalone keras package with tf.keras)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.utils import image_dataset_from_directory, to_categorical

Framing the problem

The goal of this lab is to work through a common practice of deep learning engineers: taking an existing model that does something similar to the task of interest, and fine-tuning it for the specific task at hand.

Getting the data

For this report, we will be downloading the data from Kaggle.

In [ ]:
data_folder = pathlib.Path('../../CSCN8010-Foundations-of-Machine-Learning/data/dogs-vs-cats-small')

We will use the code below to load the image datasets from the directory structure.

In [ ]:
train_dataset = image_dataset_from_directory(
    data_folder / "train",
    image_size=(180, 180),
    batch_size=32)
validation_dataset = image_dataset_from_directory(
    data_folder / "validation",
    image_size=(180, 180),
    batch_size=32)
test_dataset = image_dataset_from_directory(
    data_folder / "test",
    image_size=(180, 180),
    batch_size=32)
Found 2000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 2000 files belonging to 2 classes.
In [ ]:
test_dataset
Out[ ]:
<_BatchDataset element_spec=(TensorSpec(shape=(None, 180, 180, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>
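The `_BatchDataset` yields batches rather than individual images. With 2,000 training images, 1,000 validation images and 2,000 test images at a batch size of 32, iterating each split yields ceil(n / 32) batches — which is where the 63-batch figure in the later training logs comes from. A quick check of that arithmetic:

```python
import math

# Number of batches per split, given batch_size=32
batch_size = 32
splits = {"train": 2000, "validation": 1000, "test": 2000}

batches = {name: math.ceil(n / batch_size) for name, n in splits.items()}
print(batches)  # {'train': 63, 'validation': 32, 'test': 63}
```

The last batch of each split is partially filled, which is why the batch dimension in the element spec above is `None`.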

Then the function in the block below will convert the datasets into feature and label arrays that we can work with directly.

We will talk about the conv_base later; for now, we need it to extract our features and labels.

In [ ]:
conv_base = keras.applications.vgg16.VGG16(
    weights="imagenet",
    include_top=False,
    input_shape=(180, 180, 3))
In [ ]:
def get_features_and_labels(dataset):
    # Run every batch through the VGG16 convolutional base
    all_features = []
    all_labels = []
    for images, labels in dataset:
        # Apply the same preprocessing VGG16 was trained with
        preprocessed_images = keras.applications.vgg16.preprocess_input(images)
        features = conv_base.predict(preprocessed_images)
        all_features.append(features)
        all_labels.append(labels)
    # Stack the per-batch results into single arrays
    return np.concatenate(all_features), np.concatenate(all_labels)

train_features, train_labels = get_features_and_labels(train_dataset)
val_features, val_labels = get_features_and_labels(validation_dataset)
test_features, test_labels = get_features_and_labels(test_dataset)
1/1 [==============================] - 1s 1s/step
(157 similar per-batch progress lines omitted)
In [ ]:
train_features.shape, val_features.shape, test_features.shape
Out[ ]:
((2000, 5, 5, 512), (1000, 5, 5, 512), (2000, 5, 5, 512))
In [ ]:
train_labels
Out[ ]:
array([1, 0, 1, ..., 1, 0, 1])
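The 0/1 labels follow the `image_dataset_from_directory` convention: class indices are assigned in sorted alphanumeric order of the subfolder names, so here `cat` maps to 0 and `dog` maps to 1. A minimal sketch of that convention:

```python
# image_dataset_from_directory sorts the class subfolder names
# and uses each name's position in that sorted list as its label
class_names = sorted(["dog", "cat"])  # the subfolders under train/
label_map = {name: index for index, name in enumerate(class_names)}
print(label_map)  # {'cat': 0, 'dog': 1}
```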

Exploratory Data Analysis

Let us first put our data into a usable format we can explore.

In [ ]:
def create_image_dataframe(base_folder, dataset_type):
    # Initialize a list to store the information
    data = []
    dataset_folder = base_folder / dataset_type  # e.g., 'train', 'validation', 'test'
    
    # Iterate through each subfolder ('cat' and 'dog') in the given directory
    for subfolder in ['cat', 'dog']:
        full_subfolder_path = dataset_folder / subfolder
        
        # Iterate through each image file in the subfolder
        for image_path in full_subfolder_path.glob('*.jpg'):
            # Construct a concise relative path like "train/cat/cat.0.jpg"
            concise_path = f"{dataset_type}/{subfolder}/{image_path.name}"
            
            # Extract the image name (file name without the extension)
            image_name = image_path.stem
            
            # The label is determined by the subfolder name
            label = subfolder
            
            # Append the information to the list
            data.append({'ImagePath': concise_path, 'ImageName': image_name, 'Label': label})
    
    # Create a DataFrame from the list
    return pd.DataFrame(data)

# Define the base path for the data
data_folder = Path('../../CSCN8010-Foundations-of-Machine-Learning/data/dogs-vs-cats-small')

# Create DataFrames for train, validation, and test sets
df_train = create_image_dataframe(data_folder, 'train')
df_validation = create_image_dataframe(data_folder, 'validation')
df_test = create_image_dataframe(data_folder, 'test')

# Display the first few rows of each DataFrame as a sanity check
print("Training Set:")
display(df_train.head())

print("\nValidation Set:")
display(df_validation.head())

print("\nTest Set:")
display(df_test.head())
Training Set:
ImagePath ImageName Label
0 train/cat/cat.0.jpg cat.0 cat
1 train/cat/cat.1.jpg cat.1 cat
2 train/cat/cat.10.jpg cat.10 cat
3 train/cat/cat.100.jpg cat.100 cat
4 train/cat/cat.101.jpg cat.101 cat
Validation Set:
ImagePath ImageName Label
0 validation/cat/cat.1000.jpg cat.1000 cat
1 validation/cat/cat.1001.jpg cat.1001 cat
2 validation/cat/cat.1002.jpg cat.1002 cat
3 validation/cat/cat.1003.jpg cat.1003 cat
4 validation/cat/cat.1004.jpg cat.1004 cat
Test Set:
ImagePath ImageName Label
0 test/cat/cat.1500.jpg cat.1500 cat
1 test/cat/cat.1501.jpg cat.1501 cat
2 test/cat/cat.1502.jpg cat.1502 cat
3 test/cat/cat.1503.jpg cat.1503 cat
4 test/cat/cat.1504.jpg cat.1504 cat
In [ ]:
df_train['Set'] = 'Train'
df_validation['Set'] = 'Validation'
df_test['Set'] = 'Test'

# Concatenate the DataFrames
df_merged = pd.concat([df_train, df_validation, df_test], ignore_index=True)
In [ ]:
from PIL import Image


def plot_cat_images(df, base_folder, total_images=20, images_per_row=5):
    cat_df = df[df['Label'] == 'cat']
    plt.figure(figsize=(40, 40))
    
    for i in range(total_images):
        random_row = cat_df.sample(n=1).iloc[0]
        image_path = base_folder / random_row['ImagePath']
        
        image = Image.open(image_path)
        
        total_rows = total_images // images_per_row + int(total_images % images_per_row != 0)
        
        plt.subplot(total_rows, images_per_row, i+1)
        plt.imshow(image)
        plt.axis('off') 
    
    plt.subplots_adjust(wspace=0, hspace=0)
    plt.tight_layout(pad=0)
    plt.show()

plot_cat_images(df_merged, data_folder)
[Figure: grid of 20 randomly sampled cat images]
In [ ]:
def plot_dog_images(df, base_folder, total_images=20, images_per_row=5):
    dog_df = df[df['Label'] == 'dog']
    plt.figure(figsize=(40, 40))
    
    for i in range(total_images):
        random_row = dog_df.sample(n=1).iloc[0]
        image_path = base_folder / random_row['ImagePath']
        
        image = Image.open(image_path)
        
        total_rows = total_images // images_per_row + int(total_images % images_per_row != 0)
        
        plt.subplot(total_rows, images_per_row, i+1)
        plt.imshow(image)
        plt.axis('off') 
    
    plt.subplots_adjust(wspace=0, hspace=0)
    plt.tight_layout(pad=0)
    plt.show()

plot_dog_images(df_merged, data_folder)
[Figure: grid of 20 randomly sampled dog images]
In [ ]:
import plotly.express as px

# Count the occurrences of each label in the dataset
label_counts = df_train['Label'].value_counts()

# Create a bar chart
fig = px.bar(x=label_counts.index, y=label_counts.values, labels={'x': 'Label', 'y': 'Count'}, title='Distribution of Cats and Dogs in the Training Dataset')

# Customize the chart appearance
fig.update_traces(marker_color=['blue', 'orange'], marker_line_color='rgb(8,48,107)', marker_line_width=1.5, opacity=0.6)

# Show the plot
fig.show()
In [ ]:
import plotly.express as px

# Count the occurrences of each label in the dataset
label_counts = df_validation['Label'].value_counts()

# Create a bar chart
fig = px.bar(x=label_counts.index, y=label_counts.values, labels={'x': 'Label', 'y': 'Count'}, title='Distribution of Cats and Dogs in the Validation Dataset')

# Customize the chart appearance
fig.update_traces(marker_color=['blue', 'orange'], marker_line_color='rgb(8,48,107)', marker_line_width=1.5, opacity=0.6)

# Show the plot
fig.show()
In [ ]:
import plotly.express as px

# Count the occurrences of each label in the dataset
label_counts = df_test['Label'].value_counts()

# Create a bar chart
fig = px.bar(x=label_counts.index, y=label_counts.values, labels={'x': 'Label', 'y': 'Count'}, title='Distribution of Cats and Dogs in the Test Dataset')

# Customize the chart appearance
fig.update_traces(marker_color=['blue', 'orange'], marker_line_color='rgb(8,48,107)', marker_line_width=1.5, opacity=0.6)

# Show the plot
fig.show()

We have an equal, balanced distribution of cats and dogs in each of the train, validation and test datasets, which helps when training our model.

Next we explore the distribution of the image widths and heights of our combined dataset.

In [ ]:
from PIL import Image

# Function to get image dimensions
def get_image_dimensions(image_path):
    with Image.open(image_path) as img:
        return img.size  # width, height

# Apply the function to the DataFrame
df_merged['Dimensions'] = df_merged['ImagePath'].apply(lambda x: get_image_dimensions(data_folder / x))

# Split dimensions into separate columns
df_merged['Width'], df_merged['Height'] = zip(*df_merged['Dimensions'])

# Plot distributions of widths and heights
fig_widths = px.histogram(df_merged, x='Width', title='Distribution of Image Widths')
fig_heights = px.histogram(df_merged, x='Height', title='Distribution of Image Heights')

fig_widths.show()
fig_heights.show()

Looking at the image widths, most images fall between 250 and 450 pixels. Roughly 40% of the images are 490 to 510 pixels wide, and there are a couple of outliers at 1020 and 1050 pixels.

Similarly for image heights, most fall between 200px and 510px, with a large cluster between 370px and 379px and another group between 490px and 510px. There are also some outliers around 760px.

In [ ]:
# Basic counts
total_images = len(df_merged)
num_cats = len(df_merged[df_merged['Label'] == 'cat'])
num_dogs = len(df_merged[df_merged['Label'] == 'dog'])

print(f"Total images: {total_images}")
print(f"Number of cat images: {num_cats}")
print(f"Number of dog images: {num_dogs}")

# Proportions
prop_cats = num_cats / total_images * 100
prop_dogs = num_dogs / total_images * 100

print(f"Proportion of cat images: {prop_cats:.2f}%")
print(f"Proportion of dog images: {prop_dogs:.2f}%")
Total images: 5000
Number of cat images: 2500
Number of dog images: 2500
Proportion of cat images: 50.00%
Proportion of dog images: 50.00%

Here we can also see, in terms of raw counts, that the dataset is evenly split between our two classes.

In [ ]:
fig = px.pie(names=df_merged['Label'].value_counts().index, values=df_merged['Label'].value_counts().values, title='Class Distribution in the Dataset')
fig.show()

The same split, viewed as a pie chart.

In [ ]:
# The Width and Height columns were already computed above;
# here we just summarize them
print(df_merged[['Width', 'Height']].describe())
             Width       Height
count  5000.000000  5000.000000
mean    403.882600   360.564600
std     108.659451    96.611042
min      59.000000    41.000000
25%     322.000000   300.000000
50%     440.000000   374.000000
75%     499.000000   418.250000
max    1050.000000   768.000000

Below we have the intensity values of the RGB channels of the images across our entire dataset.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

def plot_rgb_distribution(df, base_folder, sample_size=5000):
    # Sample a subset of the DataFrame
    sampled_df = df.sample(n=sample_size, random_state=42)

    # Initialize lists to store RGB values
    r_values, g_values, b_values = [], [], []

    # Loop through the sampled DataFrame and extract RGB values
    for _, row in sampled_df.iterrows():
        image_path = base_folder / row['ImagePath']
        with Image.open(image_path) as img:
            img = img.resize((100, 100))  # Resize to reduce computation
            r, g, b = np.array(img).T  # Transpose to split RGB
            r_values.extend(r.flatten())
            g_values.extend(g.flatten())
            b_values.extend(b.flatten())

    # Plot histograms for R, G, B distributions
    plt.figure(figsize=(20, 6))

    plt.subplot(1, 3, 1)
    plt.hist(r_values, bins=256, color='red', alpha=0.7)
    plt.title('Red Channel Distribution')
    plt.xlabel('Intensity Value')
    plt.ylabel('Frequency')

    plt.subplot(1, 3, 2)
    plt.hist(g_values, bins=256, color='green', alpha=0.7)
    plt.title('Green Channel Distribution')
    plt.xlabel('Intensity Value')

    plt.subplot(1, 3, 3)
    plt.hist(b_values, bins=256, color='blue', alpha=0.7)
    plt.title('Blue Channel Distribution')
    plt.xlabel('Intensity Value')

    plt.tight_layout()
    plt.show()

# Assuming df_merged and data_folder are already defined
plot_rgb_distribution(df_merged, data_folder)
[Figure: histograms of red, green and blue channel intensity distributions]

We can also visualize the dataset using RGB values averaged across a sample of images, and inspect the resulting pixel distributions.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

def compute_average_rgb_distribution(df, base_folder, sample_size=100):
    # Initialize lists to store RGB values
    red_values, green_values, blue_values = [], [], []
    
    # Sample a subset of images to avoid memory issues
    sampled_df = df.sample(n=sample_size)
    
    for _, row in sampled_df.iterrows():
        image_path = base_folder / row['ImagePath']
        image = Image.open(image_path)
        image = image.resize((100, 100))  # Resize to standardize size
        image_np = np.array(image)
        
        # Append RGB values
        red_values.append(image_np[:, :, 0].flatten())
        green_values.append(image_np[:, :, 1].flatten())
        blue_values.append(image_np[:, :, 2].flatten())
    
    # Convert lists to Numpy arrays and compute mean across all sampled images
    red_values = np.mean(np.array(red_values), axis=0)
    green_values = np.mean(np.array(green_values), axis=0)
    blue_values = np.mean(np.array(blue_values), axis=0)
    
    # Plot the distributions
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    colors = ['Red', 'Green', 'Blue']
    for i, (data, color) in enumerate(zip([red_values, green_values, blue_values], colors)):
        axes[i].hist(data, bins=50, color=color.lower(), edgecolor='black')
        axes[i].set_title(f'Average {color} Pixel Value Distribution')
        axes[i].set_xlabel(f'{color} Pixel Value')
        axes[i].set_ylabel('Frequency')

    plt.tight_layout()
    plt.show()

# Assuming df_merged and data_folder are already defined
compute_average_rgb_distribution(df_merged, data_folder)
[Figure: histograms of average red, green and blue pixel value distributions]

Each of the red, green and blue channels shows a roughly normal distribution of average pixel values across our images.

Modeling

Model 1 - Vanilla Model

In this model, we will be defining a classifier of our choice, from scratch.

We will use a dense neural network with only two trainable layers. Simple stuff, really. Nothing complicated.

In [ ]:
inputs = keras.Input(shape=(5, 5, 512))
x = layers.Flatten()(inputs)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model_1 = keras.Model(inputs, outputs)
In [ ]:
model_1.summary()
Model: "model_15"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_28 (InputLayer)       [(None, 5, 5, 512)]       0         
                                                                 
 flatten_15 (Flatten)        (None, 12800)             0         
                                                                 
 dense_32 (Dense)            (None, 256)               3277056   
                                                                 
 dropout_16 (Dropout)        (None, 256)               0         
                                                                 
 dense_33 (Dense)            (None, 1)                 257       
                                                                 
=================================================================
Total params: 3,277,313
Trainable params: 3,277,313
Non-trainable params: 0
_________________________________________________________________
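The parameter counts in the summary can be checked by hand: Flatten turns each 5×5×512 feature map into a 12,800-dimensional vector, the hidden Dense layer then has 12,800 × 256 weights plus 256 biases, and the output layer has 256 weights plus one bias:

```python
# Recompute the parameter counts shown in model_1.summary()
flat_dim = 5 * 5 * 512               # Flatten output size: 12800
dense_params = flat_dim * 256 + 256  # hidden layer weights + biases
out_params = 256 * 1 + 1             # output layer weights + bias
print(dense_params, out_params, dense_params + out_params)
# 3277056 257 3277313
```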
In [ ]:
model_1.compile(loss="binary_crossentropy",
                optimizer="rmsprop",
                metrics=["accuracy"])

callbacks = [
    keras.callbacks.ModelCheckpoint(
      filepath="./models/vanilla.keras",
      save_best_only=True,
      monitor="val_loss")
]
history_1 = model_1.fit(
    train_features, train_labels,
    epochs=30,
    validation_data=(val_features, val_labels),
    callbacks=callbacks)
Epoch 1/30
63/63 [==============================] - 3s 38ms/step - loss: 12.2970 - accuracy: 0.9305 - val_loss: 4.8654 - val_accuracy: 0.9620
Epoch 2/30
63/63 [==============================] - 2s 30ms/step - loss: 4.4486 - accuracy: 0.9725 - val_loss: 3.1877 - val_accuracy: 0.9740
Epoch 3/30
63/63 [==============================] - 2s 29ms/step - loss: 1.6767 - accuracy: 0.9850 - val_loss: 3.3061 - val_accuracy: 0.9750
Epoch 4/30
63/63 [==============================] - 2s 32ms/step - loss: 1.9543 - accuracy: 0.9910 - val_loss: 6.1227 - val_accuracy: 0.9720
Epoch 5/30
63/63 [==============================] - 2s 30ms/step - loss: 0.8609 - accuracy: 0.9930 - val_loss: 5.5530 - val_accuracy: 0.9660
Epoch 6/30
63/63 [==============================] - 2s 29ms/step - loss: 2.2027 - accuracy: 0.9880 - val_loss: 4.9896 - val_accuracy: 0.9790
Epoch 7/30
63/63 [==============================] - 2s 29ms/step - loss: 0.4942 - accuracy: 0.9960 - val_loss: 6.5941 - val_accuracy: 0.9770
Epoch 8/30
63/63 [==============================] - 2s 29ms/step - loss: 0.3218 - accuracy: 0.9970 - val_loss: 5.4692 - val_accuracy: 0.9800
Epoch 9/30
63/63 [==============================] - 2s 31ms/step - loss: 0.2090 - accuracy: 0.9985 - val_loss: 4.7763 - val_accuracy: 0.9790
Epoch 10/30
63/63 [==============================] - 2s 29ms/step - loss: 0.2012 - accuracy: 0.9970 - val_loss: 4.3617 - val_accuracy: 0.9810
Epoch 11/30
63/63 [==============================] - 2s 29ms/step - loss: 0.5041 - accuracy: 0.9960 - val_loss: 5.8818 - val_accuracy: 0.9810
Epoch 12/30
63/63 [==============================] - 2s 30ms/step - loss: 0.1468 - accuracy: 0.9980 - val_loss: 6.5429 - val_accuracy: 0.9740
Epoch 13/30
63/63 [==============================] - 2s 33ms/step - loss: 0.0273 - accuracy: 0.9995 - val_loss: 4.6216 - val_accuracy: 0.9830
Epoch 14/30
63/63 [==============================] - 2s 32ms/step - loss: 0.1678 - accuracy: 0.9980 - val_loss: 5.3051 - val_accuracy: 0.9820
Epoch 15/30
63/63 [==============================] - 2s 32ms/step - loss: 0.0838 - accuracy: 0.9990 - val_loss: 5.2004 - val_accuracy: 0.9790
Epoch 16/30
63/63 [==============================] - 2s 32ms/step - loss: 0.0000e+00 - accuracy: 1.0000 - val_loss: 5.2004 - val_accuracy: 0.9790
Epoch 17/30
63/63 [==============================] - 2s 33ms/step - loss: 0.0145 - accuracy: 0.9990 - val_loss: 5.3837 - val_accuracy: 0.9810
Epoch 18/30
63/63 [==============================] - 2s 31ms/step - loss: 0.2898 - accuracy: 0.9970 - val_loss: 5.7486 - val_accuracy: 0.9810
Epoch 19/30
63/63 [==============================] - 2s 33ms/step - loss: 0.0655 - accuracy: 0.9990 - val_loss: 5.7090 - val_accuracy: 0.9800
Epoch 20/30
63/63 [==============================] - 2s 33ms/step - loss: 0.0361 - accuracy: 0.9995 - val_loss: 7.3197 - val_accuracy: 0.9770
Epoch 21/30
63/63 [==============================] - 2s 31ms/step - loss: 0.0617 - accuracy: 0.9990 - val_loss: 7.7944 - val_accuracy: 0.9740
Epoch 22/30
63/63 [==============================] - 2s 31ms/step - loss: 0.2232 - accuracy: 0.9970 - val_loss: 7.9625 - val_accuracy: 0.9740
Epoch 23/30
63/63 [==============================] - 2s 30ms/step - loss: 0.1044 - accuracy: 0.9995 - val_loss: 6.3751 - val_accuracy: 0.9740
Epoch 24/30
63/63 [==============================] - 2s 32ms/step - loss: 0.0904 - accuracy: 0.9985 - val_loss: 7.9347 - val_accuracy: 0.9730
Epoch 25/30
63/63 [==============================] - 2s 31ms/step - loss: 0.1744 - accuracy: 0.9980 - val_loss: 8.2863 - val_accuracy: 0.9770
Epoch 26/30
63/63 [==============================] - 2s 32ms/step - loss: 0.1326 - accuracy: 0.9985 - val_loss: 5.5675 - val_accuracy: 0.9840
Epoch 27/30
63/63 [==============================] - 2s 34ms/step - loss: 0.0659 - accuracy: 0.9990 - val_loss: 5.5832 - val_accuracy: 0.9810
Epoch 28/30
63/63 [==============================] - 2s 32ms/step - loss: 7.1576e-24 - accuracy: 1.0000 - val_loss: 5.5832 - val_accuracy: 0.9810
Epoch 29/30
63/63 [==============================] - 2s 29ms/step - loss: 0.1254 - accuracy: 0.9990 - val_loss: 6.2272 - val_accuracy: 0.9800
Epoch 30/30
63/63 [==============================] - 2s 34ms/step - loss: 0.1675 - accuracy: 0.9990 - val_loss: 5.8680 - val_accuracy: 0.9760
In [ ]:
history_df = pd.DataFrame(history_1.history)
history_df.insert(0, 'epoch', range(1, len(history_df) + 1))
history_df
Out[ ]:
epoch loss accuracy val_loss val_accuracy
0 1 1.229699e+01 0.9305 4.865433 0.962
1 2 4.448617e+00 0.9725 3.187708 0.974
2 3 1.676736e+00 0.9850 3.306071 0.975
3 4 1.954285e+00 0.9910 6.122653 0.972
4 5 8.609466e-01 0.9930 5.553020 0.966
5 6 2.202713e+00 0.9880 4.989601 0.979
6 7 4.942128e-01 0.9960 6.594050 0.977
7 8 3.217583e-01 0.9970 5.469166 0.980
8 9 2.089501e-01 0.9985 4.776286 0.979
9 10 2.012045e-01 0.9970 4.361650 0.981
10 11 5.041406e-01 0.9960 5.881827 0.981
11 12 1.467834e-01 0.9980 6.542856 0.974
12 13 2.730624e-02 0.9995 4.621624 0.983
13 14 1.678134e-01 0.9980 5.305068 0.982
14 15 8.384467e-02 0.9990 5.200406 0.979
15 16 0.000000e+00 1.0000 5.200406 0.979
16 17 1.454370e-02 0.9990 5.383736 0.981
17 18 2.897931e-01 0.9970 5.748642 0.981
18 19 6.546699e-02 0.9990 5.709039 0.980
19 20 3.610600e-02 0.9995 7.319722 0.977
20 21 6.172859e-02 0.9990 7.794371 0.974
21 22 2.231980e-01 0.9970 7.962476 0.974
22 23 1.043946e-01 0.9995 6.375087 0.974
23 24 9.037035e-02 0.9985 7.934676 0.973
24 25 1.744076e-01 0.9980 8.286255 0.977
25 26 1.326382e-01 0.9985 5.567520 0.984
26 27 6.592563e-02 0.9990 5.583184 0.981
27 28 7.157617e-24 1.0000 5.583184 0.981
28 29 1.253546e-01 0.9990 6.227195 0.980
29 30 1.675194e-01 0.9990 5.868022 0.976
In [ ]:
# Create a DataFrame from the history object
history_df = pd.DataFrame(history_1.history)

# Plot the training and validation loss
plt.figure(figsize=(9, 5))
values = history_df['accuracy']
epochs = range(1, len(values) + 1)
plt.plot(epochs, history_df['loss'], 'bo', label='Training loss')
plt.plot(epochs, history_df['val_loss'], 'ro', label='Validation loss')

plt.xlabel('Epochs')
plt.xticks(epochs)
plt.ylabel('Loss')
plt.legend()
plt.title('Training and validation loss')
plt.show()

# Plot the training and validation accuracy
plt.figure(figsize=(9, 5))
plt.plot(epochs, history_df['accuracy'], 'bo', label='Training accuracy')
plt.plot(epochs, history_df['val_accuracy'], 'ro', label='Validation accuracy')

plt.xlabel('Epochs')
plt.xticks(epochs)
plt.ylabel('Accuracy')
plt.legend()
plt.title('Training and validation accuracy')
plt.show()
[Plot: Training and validation loss]
[Plot: Training and validation accuracy]
InΒ [Β ]:
# Create a DataFrame from the history object
history_df = pd.DataFrame(history_1.history)

epochs = list(range(1, len(history_df['loss']) + 1))

# Plot the training and validation loss
fig = go.Figure()
fig.add_trace(go.Scatter(x=epochs, y=history_df['loss'], mode='lines+markers', name='Training loss'))
fig.add_trace(go.Scatter(x=epochs, y=history_df['val_loss'], mode='lines+markers', name='Validation loss'))
fig.update_layout(title='Training and validation loss', xaxis_title='Epochs', yaxis_title='Loss', xaxis=dict(tickvals=epochs))
fig.show()

# Plot the training and validation accuracy
fig = go.Figure()
fig.add_trace(go.Scatter(x=epochs, y=history_df['accuracy'], mode='lines+markers', name='Training accuracy'))
fig.add_trace(go.Scatter(x=epochs, y=history_df['val_accuracy'], mode='lines+markers', name='Validation accuracy'))
fig.update_layout(title='Training and validation accuracy', xaxis_title='Epochs', yaxis_title='Accuracy', xaxis=dict(tickvals=epochs))
fig.show()

On the validation set, the model reached its highest accuracy (0.984) at epoch 26 and its lowest loss (3.19) at epoch 2.
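Rather than reading the best epochs off the table by eye, they can be pulled from the history DataFrame directly. A small sketch with a toy history dict (the real `history_1.history` has the same keys; the values here are illustrative):

```python
import pandas as pd

# Toy history dict with the same keys Keras produces (values are illustrative)
history = {
    "loss": [12.30, 4.45, 1.68],
    "accuracy": [0.9305, 0.9725, 0.9850],
    "val_loss": [4.87, 3.19, 3.31],
    "val_accuracy": [0.962, 0.974, 0.975],
}
history_df = pd.DataFrame(history)
history_df.insert(0, "epoch", range(1, len(history_df) + 1))

# idxmax/idxmin give the row index of the best value; look up the epoch there
best_acc_epoch = history_df.loc[history_df["val_accuracy"].idxmax(), "epoch"]
best_loss_epoch = history_df.loc[history_df["val_loss"].idxmin(), "epoch"]
print(best_acc_epoch, best_loss_epoch)  # prints: 3 2
```

Note the distinction between the zero-based DataFrame index and the one-based `epoch` column: `idxmax` returns the former, so we look the epoch up explicitly.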

Below is the model's performance on the test set; we will dive deeper into evaluation later.

InΒ [Β ]:
best_model_1 = keras.models.load_model(
    "./models/vanilla.keras")
test_loss, test_acc = best_model_1.evaluate(x=test_features, y=test_labels)
print(f"Test accuracy: {test_acc:.3f}")
63/63 [==============================] - 1s 12ms/step - loss: 5.0883 - accuracy: 0.9700
Test accuracy: 0.970

Model 2 - Fine-Tune VGG16

For this model, we start from the VGG16 network pretrained on ImageNet.

To adapt it to our task, we exclude its top (fully connected classifier) layers: they were trained to separate ImageNet's 1,000 classes, whereas we only need the convolutional base as a feature extractor for our two-class problem.

InΒ [Β ]:
conv_base = keras.applications.vgg16.VGG16(
    weights="imagenet",
    include_top=False,
    )
InΒ [Β ]:
conv_base.summary()
Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_29 (InputLayer)       [(None, None, None, 3)]   0         
                                                                 
 block1_conv1 (Conv2D)       (None, None, None, 64)    1792      
                                                                 
 block1_conv2 (Conv2D)       (None, None, None, 64)    36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, None, None, 64)    0         
                                                                 
 block2_conv1 (Conv2D)       (None, None, None, 128)   73856     
                                                                 
 block2_conv2 (Conv2D)       (None, None, None, 128)   147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, None, None, 128)   0         
                                                                 
 block3_conv1 (Conv2D)       (None, None, None, 256)   295168    
                                                                 
 block3_conv2 (Conv2D)       (None, None, None, 256)   590080    
                                                                 
 block3_conv3 (Conv2D)       (None, None, None, 256)   590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, None, None, 256)   0         
                                                                 
 block4_conv1 (Conv2D)       (None, None, None, 512)   1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, None, None, 512)   0         
                                                                 
 block5_conv1 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, None, None, 512)   0         
                                                                 
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
InΒ [Β ]:
conv_base.trainable = False
conv_base.summary()
Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_29 (InputLayer)       [(None, None, None, 3)]   0         
                                                                 
 block1_conv1 (Conv2D)       (None, None, None, 64)    1792      
                                                                 
 block1_conv2 (Conv2D)       (None, None, None, 64)    36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, None, None, 64)    0         
                                                                 
 block2_conv1 (Conv2D)       (None, None, None, 128)   73856     
                                                                 
 block2_conv2 (Conv2D)       (None, None, None, 128)   147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, None, None, 128)   0         
                                                                 
 block3_conv1 (Conv2D)       (None, None, None, 256)   295168    
                                                                 
 block3_conv2 (Conv2D)       (None, None, None, 256)   590080    
                                                                 
 block3_conv3 (Conv2D)       (None, None, None, 256)   590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, None, None, 256)   0         
                                                                 
 block4_conv1 (Conv2D)       (None, None, None, 512)   1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, None, None, 512)   0         
                                                                 
 block5_conv1 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, None, None, 512)   0         
                                                                 
=================================================================
Total params: 14,714,688
Trainable params: 0
Non-trainable params: 14,714,688
_________________________________________________________________

Here we assemble the full model, defining the input shape it expects and the output head. Note the single output neuron with a sigmoid activation, the standard choice for binary classification.
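The single sigmoid unit squashes the final logit into a probability in (0, 1), which is then thresholded at 0.5 to get a class label. A minimal NumPy sketch of that decision rule (the logit values are made up):

```python
import numpy as np

def sigmoid(z):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([-2.0, 0.0, 3.5])   # raw outputs of the last Dense layer (illustrative)
probs = sigmoid(logits)               # probability of the positive class ("dog")
preds = (probs > 0.5).astype(int)     # threshold at 0.5 for the class label
print(preds)  # prints: [0 0 1]
```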

InΒ [Β ]:
inputs = keras.Input(shape=(180, 180, 3))
x = keras.applications.vgg16.preprocess_input(inputs)
x = conv_base(x)
x = layers.Flatten()(x)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model_2 = keras.Model(inputs, outputs)
InΒ [Β ]:
model_2.summary()
Model: "model_16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_30 (InputLayer)       [(None, 180, 180, 3)]     0         
                                                                 
 tf.__operators__.getitem_11  (None, 180, 180, 3)      0         
  (SlicingOpLambda)                                              
                                                                 
 tf.nn.bias_add_11 (TFOpLamb  (None, 180, 180, 3)      0         
 da)                                                             
                                                                 
 vgg16 (Functional)          (None, None, None, 512)   14714688  
                                                                 
 flatten_16 (Flatten)        (None, 12800)             0         
                                                                 
 dense_34 (Dense)            (None, 256)               3277056   
                                                                 
 dropout_17 (Dropout)        (None, 256)               0         
                                                                 
 dense_35 (Dense)            (None, 1)                 257       
                                                                 
=================================================================
Total params: 17,992,001
Trainable params: 3,277,313
Non-trainable params: 14,714,688
_________________________________________________________________
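The Flatten size of 12,800 in the summary above can be checked by hand: each of VGG16's five max-pooling layers halves the spatial dimensions (with floor division), so a 180x180 input leaves the conv base as a 5x5x512 feature map:

```python
side = 180
for _ in range(5):           # blocks 1-5 each end in a 2x2 max-pool with stride 2
    side //= 2               # 180 -> 90 -> 45 -> 22 -> 11 -> 5
flat = side * side * 512     # 512 channels after block5
print(side, flat)  # prints: 5 12800
```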
InΒ [Β ]:
conv_base.summary()
Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_29 (InputLayer)       [(None, None, None, 3)]   0         
                                                                 
 block1_conv1 (Conv2D)       (None, None, None, 64)    1792      
                                                                 
 block1_conv2 (Conv2D)       (None, None, None, 64)    36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, None, None, 64)    0         
                                                                 
 block2_conv1 (Conv2D)       (None, None, None, 128)   73856     
                                                                 
 block2_conv2 (Conv2D)       (None, None, None, 128)   147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, None, None, 128)   0         
                                                                 
 block3_conv1 (Conv2D)       (None, None, None, 256)   295168    
                                                                 
 block3_conv2 (Conv2D)       (None, None, None, 256)   590080    
                                                                 
 block3_conv3 (Conv2D)       (None, None, None, 256)   590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, None, None, 256)   0         
                                                                 
 block4_conv1 (Conv2D)       (None, None, None, 512)   1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, None, None, 512)   0         
                                                                 
 block5_conv1 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, None, None, 512)   0         
                                                                 
=================================================================
Total params: 14,714,688
Trainable params: 0
Non-trainable params: 14,714,688
_________________________________________________________________

The second fine-tuning step is to unfreeze only the last four layers of VGG16 and keep the rest frozen as they are.

Retraining every layer would overwrite the pretrained weights entirely, defeating the purpose of starting from a pretrained model; conversely, keeping all weights exactly as downloaded would leave no room to adapt the network to our dataset. Fine-tuning only the top block is the middle ground.
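The freezing pattern used in the next cell (unfreeze everything, then re-freeze all but the last four layers) can be illustrated with plain Python objects standing in for Keras layers:

```python
class Layer:
    """Stand-in for a Keras layer with a trainable flag."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

layers_list = [Layer(f"layer_{i}") for i in range(10)]

# Freeze everything except the last four layers
for layer in layers_list[:-4]:
    layer.trainable = False

trainable_names = [l.name for l in layers_list if l.trainable]
print(trainable_names)  # prints: ['layer_6', 'layer_7', 'layer_8', 'layer_9']
```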

InΒ [Β ]:
conv_base.trainable = True
for layer in conv_base.layers[:-4]:
    layer.trainable = False
InΒ [Β ]:
model_2.summary()
Model: "model_16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_30 (InputLayer)       [(None, 180, 180, 3)]     0         
                                                                 
 tf.__operators__.getitem_11  (None, 180, 180, 3)      0         
  (SlicingOpLambda)                                              
                                                                 
 tf.nn.bias_add_11 (TFOpLamb  (None, 180, 180, 3)      0         
 da)                                                             
                                                                 
 vgg16 (Functional)          (None, None, None, 512)   14714688  
                                                                 
 flatten_16 (Flatten)        (None, 12800)             0         
                                                                 
 dense_34 (Dense)            (None, 256)               3277056   
                                                                 
 dropout_17 (Dropout)        (None, 256)               0         
                                                                 
 dense_35 (Dense)            (None, 1)                 257       
                                                                 
=================================================================
Total params: 17,992,001
Trainable params: 10,356,737
Non-trainable params: 7,635,264
_________________________________________________________________
InΒ [Β ]:
conv_base.summary()
Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_29 (InputLayer)       [(None, None, None, 3)]   0         
                                                                 
 block1_conv1 (Conv2D)       (None, None, None, 64)    1792      
                                                                 
 block1_conv2 (Conv2D)       (None, None, None, 64)    36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, None, None, 64)    0         
                                                                 
 block2_conv1 (Conv2D)       (None, None, None, 128)   73856     
                                                                 
 block2_conv2 (Conv2D)       (None, None, None, 128)   147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, None, None, 128)   0         
                                                                 
 block3_conv1 (Conv2D)       (None, None, None, 256)   295168    
                                                                 
 block3_conv2 (Conv2D)       (None, None, None, 256)   590080    
                                                                 
 block3_conv3 (Conv2D)       (None, None, None, 256)   590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, None, None, 256)   0         
                                                                 
 block4_conv1 (Conv2D)       (None, None, None, 512)   1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, None, None, 512)   0         
                                                                 
 block5_conv1 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, None, None, 512)   0         
                                                                 
=================================================================
Total params: 14,714,688
Trainable params: 7,079,424
Non-trainable params: 7,635,264
_________________________________________________________________

The summaries above show how many of the total parameters are now trainable.
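These counts can be verified by hand. Each block5 Conv2D layer has 3x3 kernels over 512 input channels, 512 output channels, and 512 biases; three of them make up the conv base's trainable parameters, and adding the new Dense head gives the model total:

```python
conv_params = 3 * 3 * 512 * 512 + 512          # one block5 Conv2D: 2,359,808
conv_base_trainable = 3 * conv_params           # block5_conv1..3 (pooling has no params)
head_params = 12800 * 256 + 256 + 256 * 1 + 1   # Dense(256) weights+biases, then Dense(1)
total_trainable = conv_base_trainable + head_params
print(conv_base_trainable, total_trainable)  # prints: 7079424 10356737
```

These match the "Trainable params" lines in the two summaries exactly.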

InΒ [Β ]:
model_2.summary()
Model: "model_16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_30 (InputLayer)       [(None, 180, 180, 3)]     0         
                                                                 
 tf.__operators__.getitem_11  (None, 180, 180, 3)      0         
  (SlicingOpLambda)                                              
                                                                 
 tf.nn.bias_add_11 (TFOpLamb  (None, 180, 180, 3)      0         
 da)                                                             
                                                                 
 vgg16 (Functional)          (None, None, None, 512)   14714688  
                                                                 
 flatten_16 (Flatten)        (None, 12800)             0         
                                                                 
 dense_34 (Dense)            (None, 256)               3277056   
                                                                 
 dropout_17 (Dropout)        (None, 256)               0         
                                                                 
 dense_35 (Dense)            (None, 1)                 257       
                                                                 
=================================================================
Total params: 17,992,001
Trainable params: 10,356,737
Non-trainable params: 7,635,264
_________________________________________________________________

You will notice the model summary printed after each operation; this is how I track the trainable and non-trainable parameter counts as layers are frozen and unfrozen.

We now have 10,356,737 trainable parameters for this model. Let's train it.

InΒ [Β ]:
model_2.compile(loss="binary_crossentropy",
                optimizer=keras.optimizers.RMSprop(learning_rate=1e-5),
                metrics=["accuracy"])

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="./models/finetune-vgg16.keras",
        save_best_only=True,
        monitor="val_loss")
]
history_2 = model_2.fit(
    train_dataset,
    epochs=30,
    validation_data=validation_dataset,
    callbacks=callbacks)
Epoch 1/30
63/63 [==============================] - 333s 5s/step - loss: 3.9673 - accuracy: 0.7900 - val_loss: 0.6739 - val_accuracy: 0.9310
Epoch 2/30
63/63 [==============================] - 349s 6s/step - loss: 0.8357 - accuracy: 0.9240 - val_loss: 0.4260 - val_accuracy: 0.9440
Epoch 3/30
63/63 [==============================] - 226s 4s/step - loss: 0.3510 - accuracy: 0.9555 - val_loss: 0.3907 - val_accuracy: 0.9480
Epoch 4/30
63/63 [==============================] - 175s 3s/step - loss: 0.1987 - accuracy: 0.9760 - val_loss: 0.3240 - val_accuracy: 0.9530
Epoch 5/30
63/63 [==============================] - 175s 3s/step - loss: 0.1396 - accuracy: 0.9825 - val_loss: 0.4348 - val_accuracy: 0.9490
Epoch 6/30
63/63 [==============================] - 177s 3s/step - loss: 0.0939 - accuracy: 0.9875 - val_loss: 0.3031 - val_accuracy: 0.9660
Epoch 7/30
63/63 [==============================] - 176s 3s/step - loss: 0.0501 - accuracy: 0.9900 - val_loss: 0.3591 - val_accuracy: 0.9580
Epoch 8/30
63/63 [==============================] - 174s 3s/step - loss: 0.0357 - accuracy: 0.9910 - val_loss: 0.3156 - val_accuracy: 0.9640
Epoch 9/30
63/63 [==============================] - 171s 3s/step - loss: 0.0096 - accuracy: 0.9975 - val_loss: 0.3155 - val_accuracy: 0.9630
Epoch 10/30
63/63 [==============================] - 171s 3s/step - loss: 0.0331 - accuracy: 0.9940 - val_loss: 0.3937 - val_accuracy: 0.9620
Epoch 11/30
63/63 [==============================] - 172s 3s/step - loss: 0.0308 - accuracy: 0.9955 - val_loss: 0.3326 - val_accuracy: 0.9660
Epoch 12/30
63/63 [==============================] - 173s 3s/step - loss: 0.0092 - accuracy: 0.9985 - val_loss: 0.3666 - val_accuracy: 0.9660
Epoch 13/30
63/63 [==============================] - 173s 3s/step - loss: 0.0288 - accuracy: 0.9965 - val_loss: 0.3669 - val_accuracy: 0.9640
Epoch 14/30
63/63 [==============================] - 173s 3s/step - loss: 0.0070 - accuracy: 0.9975 - val_loss: 0.4195 - val_accuracy: 0.9650
Epoch 15/30
63/63 [==============================] - 172s 3s/step - loss: 0.0146 - accuracy: 0.9975 - val_loss: 0.4164 - val_accuracy: 0.9640
Epoch 16/30
63/63 [==============================] - 175s 3s/step - loss: 0.0173 - accuracy: 0.9980 - val_loss: 0.4595 - val_accuracy: 0.9630
Epoch 17/30
63/63 [==============================] - 173s 3s/step - loss: 0.0288 - accuracy: 0.9980 - val_loss: 0.4135 - val_accuracy: 0.9640
Epoch 18/30
63/63 [==============================] - 172s 3s/step - loss: 9.6472e-04 - accuracy: 0.9995 - val_loss: 0.4394 - val_accuracy: 0.9650
Epoch 19/30
63/63 [==============================] - 174s 3s/step - loss: 0.0064 - accuracy: 0.9990 - val_loss: 0.4073 - val_accuracy: 0.9670
Epoch 20/30
63/63 [==============================] - 172s 3s/step - loss: 0.0064 - accuracy: 0.9990 - val_loss: 0.3776 - val_accuracy: 0.9700
Epoch 21/30
63/63 [==============================] - 172s 3s/step - loss: 0.0067 - accuracy: 0.9995 - val_loss: 0.3738 - val_accuracy: 0.9700
Epoch 22/30
63/63 [==============================] - 171s 3s/step - loss: 0.0027 - accuracy: 0.9990 - val_loss: 0.3610 - val_accuracy: 0.9730
Epoch 23/30
63/63 [==============================] - 173s 3s/step - loss: 0.0068 - accuracy: 0.9990 - val_loss: 0.4539 - val_accuracy: 0.9640
Epoch 24/30
63/63 [==============================] - 171s 3s/step - loss: 0.0044 - accuracy: 0.9995 - val_loss: 0.3806 - val_accuracy: 0.9680
Epoch 25/30
63/63 [==============================] - 171s 3s/step - loss: 0.0064 - accuracy: 0.9985 - val_loss: 0.3723 - val_accuracy: 0.9690
Epoch 26/30
63/63 [==============================] - 172s 3s/step - loss: 2.9788e-05 - accuracy: 1.0000 - val_loss: 0.3737 - val_accuracy: 0.9680
Epoch 27/30
63/63 [==============================] - 174s 3s/step - loss: 0.0072 - accuracy: 0.9990 - val_loss: 0.3541 - val_accuracy: 0.9660
Epoch 28/30
63/63 [==============================] - 173s 3s/step - loss: 0.0018 - accuracy: 0.9990 - val_loss: 0.3541 - val_accuracy: 0.9740
Epoch 29/30
63/63 [==============================] - 172s 3s/step - loss: 2.9186e-05 - accuracy: 1.0000 - val_loss: 0.3403 - val_accuracy: 0.9690
Epoch 30/30
63/63 [==============================] - 172s 3s/step - loss: 0.0016 - accuracy: 0.9995 - val_loss: 0.3480 - val_accuracy: 0.9720
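The `save_best_only` checkpoint above writes the model to disk only when the monitored `val_loss` improves, so the saved file always holds the best epoch seen so far. The bookkeeping can be mimicked in plain Python (`val_losses` here is a made-up series):

```python
val_losses = [0.67, 0.43, 0.39, 0.32, 0.43, 0.30, 0.36]  # illustrative per-epoch values

best = float("inf")
best_epoch = None
for epoch, vl in enumerate(val_losses, start=1):
    if vl < best:              # ModelCheckpoint's "improved" test for a minimized metric
        best = vl
        best_epoch = epoch     # this is when the model file would be overwritten
print(best_epoch, best)  # prints: 6 0.3
```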
InΒ [Β ]:
history_df = pd.DataFrame(history_2.history)
history_df.insert(0, 'epoch', range(1, len(history_df) + 1))
history_df
Out[Β ]:
epoch loss accuracy val_loss val_accuracy
0 1 3.967278 0.7900 0.673886 0.931
1 2 0.835655 0.9240 0.425955 0.944
2 3 0.350983 0.9555 0.390702 0.948
3 4 0.198697 0.9760 0.324017 0.953
4 5 0.139607 0.9825 0.434827 0.949
5 6 0.093875 0.9875 0.303122 0.966
6 7 0.050138 0.9900 0.359109 0.958
7 8 0.035728 0.9910 0.315623 0.964
8 9 0.009566 0.9975 0.315540 0.963
9 10 0.033070 0.9940 0.393651 0.962
10 11 0.030840 0.9955 0.332614 0.966
11 12 0.009177 0.9985 0.366628 0.966
12 13 0.028793 0.9965 0.366944 0.964
13 14 0.007045 0.9975 0.419473 0.965
14 15 0.014559 0.9975 0.416413 0.964
15 16 0.017318 0.9980 0.459482 0.963
16 17 0.028846 0.9980 0.413502 0.964
17 18 0.000965 0.9995 0.439376 0.965
18 19 0.006430 0.9990 0.407346 0.967
19 20 0.006387 0.9990 0.377560 0.970
20 21 0.006714 0.9995 0.373765 0.970
21 22 0.002674 0.9990 0.360964 0.973
22 23 0.006838 0.9990 0.453943 0.964
23 24 0.004358 0.9995 0.380644 0.968
24 25 0.006367 0.9985 0.372264 0.969
25 26 0.000030 1.0000 0.373740 0.968
26 27 0.007171 0.9990 0.354063 0.966
27 28 0.001818 0.9990 0.354124 0.974
28 29 0.000029 1.0000 0.340299 0.969
29 30 0.001609 0.9995 0.347990 0.972
InΒ [Β ]:
history_df = pd.DataFrame(history_2.history)

# Plot the training and validation loss
plt.figure(figsize=(9, 5))
values = history_df['accuracy']
epochs = range(1, len(values) + 1)
plt.plot(epochs, history_df['loss'], 'bo', label='Training loss')
plt.plot(epochs, history_df['val_loss'], 'ro', label='Validation loss')

plt.xlabel('Epochs')
plt.xticks(epochs)
plt.ylabel('Loss')
plt.legend()
plt.title('Training and validation loss')
plt.show()

# Plot the training and validation accuracy
plt.figure(figsize=(9, 5))
plt.plot(epochs, history_df['accuracy'], 'bo', label='Training accuracy')
plt.plot(epochs, history_df['val_accuracy'], 'ro', label='Validation accuracy')

plt.xlabel('Epochs')
plt.xticks(epochs)
plt.ylabel('Accuracy')
plt.legend()
plt.title('Training and validation accuracy')
plt.show()
[Plot: Training and validation loss]
[Plot: Training and validation accuracy]
InΒ [Β ]:
# Create a DataFrame from the history object
history_df = pd.DataFrame(history_2.history)

epochs = list(range(1, len(history_df['loss']) + 1))

# Plot the training and validation loss
fig = go.Figure()
fig.add_trace(go.Scatter(x=epochs, y=history_df['loss'], mode='lines+markers', name='Training loss'))
fig.add_trace(go.Scatter(x=epochs, y=history_df['val_loss'], mode='lines+markers', name='Validation loss'))
fig.update_layout(title='Training and validation loss', xaxis_title='Epochs', yaxis_title='Loss', xaxis=dict(tickvals=epochs))
fig.show()

# Plot the training and validation accuracy
fig = go.Figure()
fig.add_trace(go.Scatter(x=epochs, y=history_df['accuracy'], mode='lines+markers', name='Training accuracy'))
fig.add_trace(go.Scatter(x=epochs, y=history_df['val_accuracy'], mode='lines+markers', name='Validation accuracy'))
fig.update_layout(title='Training and validation accuracy', xaxis_title='Epochs', yaxis_title='Accuracy', xaxis=dict(tickvals=epochs))
fig.show()

Analyzing the validation performance in the graphs, the model reached its highest accuracy of 97.4% at epoch 28 and its lowest loss of 0.303 at epoch 6, matching the DataFrame above.

InΒ [Β ]:
best_model_2 = keras.models.load_model("./models/finetune-vgg16.keras")
best_model_2.summary()
Model: "model_16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_30 (InputLayer)       [(None, 180, 180, 3)]     0         
                                                                 
 tf.__operators__.getitem_11  (None, 180, 180, 3)      0         
  (SlicingOpLambda)                                              
                                                                 
 tf.nn.bias_add_11 (TFOpLamb  (None, 180, 180, 3)      0         
 da)                                                             
                                                                 
 vgg16 (Functional)          (None, None, None, 512)   14714688  
                                                                 
 flatten_16 (Flatten)        (None, 12800)             0         
                                                                 
 dense_34 (Dense)            (None, 256)               3277056   
                                                                 
 dropout_17 (Dropout)        (None, 256)               0         
                                                                 
 dense_35 (Dense)            (None, 1)                 257       
                                                                 
=================================================================
Total params: 17,992,001
Trainable params: 10,356,737
Non-trainable params: 7,635,264
_________________________________________________________________

Evaluation

Model 1 Evaluation

Accuracy

We will first use the evaluate method built into the Keras model, which returns the test loss and test accuracy.

InΒ [Β ]:
test_loss, test_acc = best_model_1.evaluate(x=test_features, y=test_labels)
print(f"Test accuracy: {test_acc:.3f}")
63/63 [==============================] - 0s 6ms/step - loss: 5.0883 - accuracy: 0.9700
Test accuracy: 0.970

Next, we compute the same performance metrics manually, following the workflow we have used so far.

InΒ [Β ]:
y_test_pred = best_model_1.predict(test_features)
63/63 [==============================] - 0s 5ms/step
InΒ [Β ]:
y_test_pred
Out[Β ]:
array([[1.],
       [1.],
       [0.],
       ...,
       [1.],
       [0.],
       [1.]], dtype=float32)
InΒ [Β ]:
y_test_pred_binary = (y_test_pred > 0.5).astype("int32").flatten()
InΒ [Β ]:
test_labels, y_test_pred_binary
Out[Β ]:
(array([1, 1, 0, ..., 1, 0, 1]), array([1, 1, 0, ..., 1, 0, 1]))
InΒ [Β ]:
accuracy = accuracy_score(test_labels, y_test_pred_binary)
print(f'Calculated Accuracy: {accuracy}')
Calculated Accuracy: 0.97

Confusion Matrix

InΒ [Β ]:
conf_mat = confusion_matrix(test_labels, y_test_pred_binary)
print(f'Confusion matrix:\n {conf_mat}')
conf_mat_dis = ConfusionMatrixDisplay(conf_mat, display_labels=['cat', 'dog'])
conf_mat_dis.plot()
conf_mat_dis.ax_.set_title('Confusion Matrix')
plt.show()
Confusion matrix:
 [[975  25]
 [ 35 965]]
[Plot: Confusion matrix display]

Precision, Recall and F1ΒΆ

InΒ [Β ]:
report = classification_report(test_labels, y_test_pred_binary)
report_df = pd.DataFrame(classification_report(test_labels, y_test_pred_binary, output_dict=True)).T
report_df
Out[Β ]:
precision recall f1-score support
0 0.965347 0.975 0.970149 1000.00
1 0.974747 0.965 0.969849 1000.00
accuracy 0.970000 0.970 0.970000 0.97
macro avg 0.970047 0.970 0.969999 2000.00
weighted avg 0.970047 0.970 0.969999 2000.00

Precision Recall CurveΒΆ

InΒ [Β ]:
precision, recall, thresholds = precision_recall_curve(test_labels, y_test_pred.flatten())
InΒ [Β ]:
threshold = 0.9
InΒ [Β ]:
plt.plot(thresholds, precision[:-1], "b--", label="Precision", linewidth=2)
plt.plot(thresholds, recall[:-1], "g-", label="Recall", linewidth=2)


plt.vlines(threshold, 0, 1.0, "k", "dotted", label="threshold")
idx = (thresholds >= threshold).argmax()  # first index β‰₯ threshold
plt.plot(thresholds[idx], precision[idx], "bo")
plt.plot(thresholds[idx], recall[idx], "go")


plt.xlabel("Threshold")
plt.legend(loc="center right")
plt.grid(True)
plt.show()
[Image: precision and recall versus decision threshold]
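To make explicit what precision_recall_curve computes at each threshold, here is a small hand computation on toy scores (the labels and scores below are illustrative, not taken from the model):

```python
import numpy as np

# Toy ground-truth labels and sigmoid-style scores (illustrative only)
y_true = np.array([0, 0, 1, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.95, 0.6, 0.7])

def precision_recall_at(threshold):
    # Predict positive wherever the score exceeds the threshold
    y_pred = y_score > threshold
    tp = int(np.sum(y_pred & (y_true == 1)))
    fp = int(np.sum(y_pred & (y_true == 0)))
    fn = int(np.sum(~y_pred & (y_true == 1)))
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn)
    return precision, recall

print(precision_recall_at(0.5))   # (0.75, 0.75)
# Raising the threshold drops the 0.6-scoring negative, so precision rises:
print(precision_recall_at(0.65))  # (1.0, 0.75)
```

This is the trade-off the dotted threshold line marks in the plot above: moving it right generally raises precision and lowers recall.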
InΒ [Β ]:
plt.plot(recall, precision, "b-", linewidth=2)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.grid(True)
plt.title('Precision-Recall curve')
plt.show()
[Image: precision-recall curve]
InΒ [Β ]:
predictions_df = pd.DataFrame({
    'Probability': y_test_pred.flatten().round(6),
    'Predicted_Class': y_test_pred_binary,
    'Ground_Truth': test_labels
})

display(predictions_df)
Probability Predicted_Class Ground_Truth
0 1.0 1 1
1 1.0 1 1
2 0.0 0 0
3 0.0 0 0
4 1.0 1 1
... ... ... ...
1995 1.0 1 1
1996 1.0 1 1
1997 1.0 1 1
1998 0.0 0 0
1999 1.0 1 1

2000 rows Γ— 3 columns

InΒ [Β ]:
# Filter rows where Predicted_Class does not match Ground_Truth
mismatch_df = predictions_df[predictions_df['Predicted_Class'] != predictions_df['Ground_Truth']]

display(mismatch_df)
Probability Predicted_Class Ground_Truth
60 1.000000 1 0
119 1.000000 1 0
153 1.000000 1 0
165 0.000000 0 1
166 0.000000 0 1
167 1.000000 1 0
168 1.000000 1 0
172 1.000000 1 0
175 0.000000 0 1
207 1.000000 1 0
234 0.000000 0 1
244 0.000000 0 1
270 0.000000 0 1
317 0.000000 0 1
358 0.000000 0 1
385 0.000000 0 1
406 1.000000 1 0
439 0.999394 1 0
512 0.000000 0 1
540 0.000000 0 1
548 0.000000 0 1
637 0.000000 0 1
644 1.000000 1 0
695 0.000000 0 1
730 0.997206 1 0
763 0.000000 0 1
822 0.000000 0 1
827 0.000000 0 1
855 0.000000 0 1
875 0.000000 0 1
880 0.000000 0 1
896 1.000000 1 0
937 1.000000 1 0
1063 0.000000 0 1
1067 0.000000 0 1
1129 1.000000 1 0
1170 1.000000 1 0
1195 1.000000 1 0
1371 0.000000 0 1
1385 0.000000 0 1
1407 1.000000 1 0
1460 0.982195 1 0
1483 1.000000 1 0
1500 0.000000 0 1
1523 1.000000 1 0
1535 1.000000 1 0
1541 0.000000 0 1
1543 0.000000 0 1
1558 0.000000 0 1
1643 1.000000 1 0
1698 0.999802 1 0
1734 1.000000 1 0
1795 1.000000 1 0
1817 0.000000 0 1
1831 0.000000 0 1
1851 0.000000 0 1
1884 0.000000 0 1
1895 0.000000 0 1
1903 0.000000 0 1
1920 0.000000 0 1
InΒ [Β ]:
len(mismatch_df)
Out[Β ]:
60

We have 60 cases where our model incorrectly predicted the label on the test dataset. We will look into these instances shortly.

Model 2 EvaluationΒΆ

Datasets created with image_dataset_from_directory are shuffled by default, and a tf.data dataset reshuffles on every pass. If we collected the labels in one pass and called predict in another, the two orders would no longer match, corrupting every metric computed from them.

To avoid this, we extract the images and labels in a single pass, then predict on that fixed copy.

Solution adapted from https://stackoverflow.com/a/71814816/13959748

InΒ [Β ]:
test_labels = []
test_images = []

for images, labels in test_dataset:
    test_labels.extend(labels.numpy()) 
    test_images.extend(images.numpy())  

test_images_dataset = tf.data.Dataset.from_tensor_slices(test_images).batch(16)
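To illustrate why the single-pass extraction matters, here is a toy stand-in in plain Python (no TensorFlow) for a dataset that reshuffles on every iteration; the names and values are made up for the demonstration:

```python
import random

# Toy stand-in for a shuffled dataset: each fresh iteration yields
# the (image, label) pairs in a new random order.
def shuffled_pass(pairs, seed):
    order = list(range(len(pairs)))
    random.Random(seed).shuffle(order)
    return [pairs[i] for i in order]

pairs = [(f"img_{i}", i % 2) for i in range(6)]  # label = parity of the index

# Risky: labels taken from one pass, "predictions" made on a second pass.
# The two orders usually disagree element-wise, even though the multiset
# of labels is identical.
labels_pass1 = [label for _, label in shuffled_pass(pairs, seed=1)]
labels_pass2 = [label for _, label in shuffled_pass(pairs, seed=2)]

# Safe: take images and labels from the SAME pass, then predict on that copy.
images, labels = zip(*shuffled_pass(pairs, seed=3))
# Now images[k] and labels[k] always refer to the same example:
print(all(int(name.split("_")[1]) % 2 == lab
          for name, lab in zip(images, labels)))  # True
```

The cell above follows the same pattern: because the images and labels come out of one pass over test_dataset, index k of the predictions lines up with index k of test_labels.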

AccuracyΒΆ

InΒ [Β ]:
y_test_pred = best_model_2.predict(test_images_dataset)
125/125 [==============================] - 94s 750ms/step
InΒ [Β ]:
y_test_pred
Out[Β ]:
array([[1.0000000e+00],
       [2.0270474e-05],
       [2.1983158e-18],
       ...,
       [6.3825097e-15],
       [1.0000000e+00],
       [1.0000000e+00]], dtype=float32)
InΒ [Β ]:
y_test_pred_binary = (y_test_pred > 0.5).astype("int32").flatten()
InΒ [Β ]:
manual_accuracy = accuracy_score(test_labels, y_test_pred_binary)
print(f'Manually Calculated Accuracy: {manual_accuracy}')
Manually Calculated Accuracy: 0.956

The built-in Keras evaluation below reports the same accuracy, confirming that our labels and predictions are correctly aligned, so we can proceed with the remaining performance metrics.

InΒ [Β ]:
# Now, the manual accuracy should match the built-in evaluate accuracy
test_loss, test_acc = best_model_2.evaluate(test_dataset)
print(f'Evaluate Accuracy: {test_acc}')
63/63 [==============================] - 100s 2s/step - loss: 0.3981 - accuracy: 0.9560
Evaluate Accuracy: 0.9559999704360962

Confusion MatrixΒΆ

InΒ [Β ]:
conf_mat = confusion_matrix(test_labels, y_test_pred_binary)
print(f'Confusion matrix:\n {conf_mat}')
conf_mat_dis = ConfusionMatrixDisplay(conf_mat, display_labels=['cat', 'dog'])
conf_mat_dis.plot()
conf_mat_dis.ax_.set_title('Confusion Matrix')
plt.show()
Confusion matrix:
 [[962  38]
 [ 50 950]]
[Image: confusion matrix display for Model 2]

Precision, Recall and F1ΒΆ

InΒ [Β ]:
report = classification_report(test_labels, y_test_pred_binary)
report_df = pd.DataFrame(classification_report(test_labels, y_test_pred_binary, output_dict=True)).T
report_df
Out[Β ]:
precision recall f1-score support
0 0.950593 0.962 0.956262 1000.000
1 0.961538 0.950 0.955734 1000.000
accuracy 0.956000 0.956 0.956000 0.956
macro avg 0.956066 0.956 0.955998 2000.000
weighted avg 0.956066 0.956 0.955998 2000.000

Precision Recall CurveΒΆ

InΒ [Β ]:
precision, recall, thresholds = precision_recall_curve(test_labels, y_test_pred.flatten())
InΒ [Β ]:
threshold = 0.9
InΒ [Β ]:
plt.plot(thresholds, precision[:-1], "b--", label="Precision", linewidth=2)
plt.plot(thresholds, recall[:-1], "g-", label="Recall", linewidth=2)


plt.vlines(threshold, 0, 1.0, "k", "dotted", label="threshold")
idx = (thresholds >= threshold).argmax()  # first index β‰₯ threshold
plt.plot(thresholds[idx], precision[idx], "bo")
plt.plot(thresholds[idx], recall[idx], "go")


plt.xlabel("Threshold")
plt.legend(loc="center right")
plt.grid(True)
plt.show()
[Image: precision and recall versus decision threshold]
InΒ [Β ]:
plt.plot(recall, precision, "b-", linewidth=2)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.grid(True)
plt.title('Precision-Recall curve')
plt.show()
[Image: precision-recall curve]
InΒ [Β ]:
predictions_df = pd.DataFrame({
    'Probability': y_test_pred.flatten().round(6),
    'Predicted_Class': y_test_pred_binary,
    'Ground_Truth': test_labels
})

display(predictions_df)
Probability Predicted_Class Ground_Truth
0 1.00000 1 1
1 0.00002 0 0
2 0.00000 0 0
3 0.00061 0 0
4 1.00000 1 1
... ... ... ...
1995 1.00000 1 1
1996 0.00000 0 0
1997 0.00000 0 0
1998 1.00000 1 1
1999 1.00000 1 1

2000 rows Γ— 3 columns

InΒ [Β ]:
mismatch_df = predictions_df[predictions_df['Predicted_Class'] != predictions_df['Ground_Truth']]

display(mismatch_df)
Probability Predicted_Class Ground_Truth
34 0.009803 0 1
86 0.843140 1 0
167 0.000017 0 1
184 1.000000 1 0
192 0.563013 1 0
... ... ... ...
1904 1.000000 1 0
1953 0.000014 0 1
1967 1.000000 1 0
1982 0.001493 0 1
1993 0.999473 1 0

88 rows Γ— 3 columns

InΒ [Β ]:
mismatch_indices = mismatch_df.index.tolist()
len(mismatch_indices)
Out[Β ]:
88

There are 88 cases where the fine-tuned model predicted the wrong label on the test dataset.

InΒ [Β ]:
def visualize_mismatches(dataset, mismatch_indices, max_images=10):
    """Display up to max_images images whose position in the dataset
    is listed in mismatch_indices.

    Assumes the dataset is iterated in the same order that was used
    when the predictions were generated."""
    count, img_count = 0, 0
    for images, labels in dataset.unbatch().batch(1):  # Iterate one image at a time
        if count in mismatch_indices:  # Current image was misclassified
            plt.figure(figsize=(2, 2))
            plt.imshow(images[0].numpy().astype("uint8"))
            plt.title(f'Index: {count}')
            plt.axis('off')
            plt.show()

            img_count += 1
            if img_count >= max_images:  # Stop once enough examples are shown
                break
        count += 1
InΒ [Β ]:
visualize_mismatches(test_dataset, mismatch_indices)
[Images: ten misclassified test examples, one per figure]

Reviewing the instances where the model failed, a few patterns stand out: blurry or noisy images, animals facing away from the camera, heads partially out of frame, a human appearing in the shot, and multiple animals in a single image.

Each of these factors appears in at least one failure case. Interestingly, the first model still predicted some of these difficult cases correctly, and the second model got some cases right that the first model failed on.

ConclusionΒΆ

We trained and evaluated two models: a vanilla CNN built from scratch and a fine-tuned VGG16 model.

It was interesting to see that the vanilla model outperformed the fine-tuned VGG16 on both the validation and the test data. The differences in accuracy and precision are small, but the direction of the result is still noteworthy.

The vanilla model reached 97.0% accuracy on the test set, while the fine-tuned VGG16 model reached 95.6%.

On both models, Class 0 (Cat) had lower precision than Class 1 (Dog): 96.53% versus 97.47% for the vanilla model, and 95.06% versus 96.15% for VGG16. Conversely, Class 0 had higher recall than Class 1: 97.5% versus 96.5% for the vanilla model, and 96.2% versus 95.0% for VGG16.
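For reference, the headline test-set numbers can also be collected programmatically; the values below are copied from the evaluation sections above, not recomputed:

```python
# Headline test-set metrics, copied from the evaluation sections above
results = {
    "vanilla": {"accuracy": 0.970, "cat_precision": 0.9653, "dog_precision": 0.9747,
                "cat_recall": 0.975, "dog_recall": 0.965},
    "vgg16":   {"accuracy": 0.956, "cat_precision": 0.9506, "dog_precision": 0.9615,
                "cat_recall": 0.962, "dog_recall": 0.950},
}

# The vanilla model leads on every metric, by at most ~1.5 percentage points
for metric in results["vanilla"]:
    gap = results["vanilla"][metric] - results["vgg16"][metric]
    print(f"{metric:>14}: vanilla leads by {gap * 100:.2f} pp")
```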